## Overview of the archive

This archive contains replication data and source code for the paper
"Confounding adaptation in perennial climate damages: A unified
statistical approach for Brazilian coffee" by James A. Rising.

Included are input and output data (under `data`), analysis scripts
(under `src`), and result figures (under `figures`).  Below, each data
and source file is described, followed by the process for generating
the results. Intermediate outputs are included, so that all scripts
can be run independently as well as part of the whole process.

### Source code

R source code is under the `src` directory. Here is a short summary of
the scripts there:

Data-preparation scripts:

 - `crossval.R`: Evaluate the month range, EDD thresholds, and
   variable inclusion, using leave-one-out state cross-validation.
 - `prepare-observations.R`: Prepare observations dataset with coffee
   outcomes and weather.

Modeling scripts:

 - `model-combine-simple.R`: Combine municipality-level results in a
   non-Bayesian way
 - `model-combine.R`: Combine municipality-level calibrations within
   hierarchical Bayesian setup.
 - `model-report.R`: Produce the master tables comparing OLS and
   Bayesian models.
 - `model-single-nos.R`: Fit the econometric-structural model to
   each municipality, with no harvest selection.
 - `model-single-ols.R`: Fit the OLS model using the Bayesian,
   single-region approach.
 - `model-single-rational-nos.R`: Fit the econometric-structural model
   to each municipality, assuming rational expectations and no harvest
   selection.
 - `model-single-rational.R`: Fit the econometric-structural model to
   each municipality, with rational price expectations.
 - `model-single.R`: Fit the econometric-structural model to each
   municipality.
 - `simulate-models.R`: Perform simulations using age-based yields.

Figure-generation scripts:

 - `fig-basics.R`: Prepare figures describing the data.
 - `fig-costkdds.R`: Produce the relationship between parameter values
   and cost-EDD sensitivity.
 - `fig-crossval.R`: Produce graphs describing the cross-validation
   results.
 - `fig-delays.R`: Produce the distributed lags figure and table.
 - `fig-gddmod.R`: Produce the GDD-EDD dose-response curves.
 - `fig-munimean.R`: Produce figures describing municipality-level
   coefficients.
 - `fig-yielddist.R`: Comparison of yield distribution and KDD effect
   graphs.
 - `price-compare.R`: Compare various measures of coffee prices.
   
Table generation scripts:

 - `tbl-allmunis-topmunis.R`: Produce regression tables for main and
   alternative spec.
 - `tbl-data.R`: Extract summary statistics from the data.
 - `tbl-eddsens.R`: Perform the main specification under different EDD
   thresholds.
 - `tbl-elasticity.R`: Calculate planting elasticity.
 - `tbl-errorcalib.R`: Calibration error estimatation.
 - `tbl-pricemod.R`: Perform regressions to understand effect of
   prices.
 - `tbl-prodharvmods.R`: Regressions that compare effects on yield,
   produce, and harvest.
 - `tbl-selection.R`: Test for selection bias.

Other result-producing scripts:

 - `txt-varyields.R`: Report various yield statistics from data and
   model fits.

Supporting code:

 - `conley.R`: Conley SEs corrected for spatial autocorrelation.
 - `cpp-functions.cpp`: C++ functions to support conley.R
 - `felm-tools.R`: Functions for managing results from the lfe::felm
   fixed-effects linear model fitting function.
 - `load-single.R`: Prepares results from model-single.R for
   model-combine.R.
 - `load.R`: Load the core regression data, and set up other features
   of the main analyses.
 - `simualte-yields.R`: Age-based yield data from Arak (1967) and
   checks.

### Data files

Source and output data files are both stored in the `data`
directory. Here is a short summary of the files there:

External coffee production data:

 - `brazil-coffee.csv`: Contains coffee area data for Arabica and
   Robusta from Instituto Brasileiro de Geografia e Estatistica
   (IBGE).
 - `coffee-harvested-total.csv`: Contains coffee harvested data across
   all varieties from IBGE.
 - `coffee-produced-total.csv`: Contains coffee production data across
   all varieties from IBGE.
 - `coffee-produce.csv`: Contains coffee production data for Arabica
   and Robusta from IBGE.
 - `weather-brazil-sachs2.csv`: Weather data from global coffee study,
   https://scholarship.law.columbia.edu/sustainable_investment_staffpubs/53/.
 - `brazil-maxpoints.csv` Geographic points of maximum production by
   municipality.

External price data:

 - `3a - Prices paid to growers.xlsx`: Prices paid to growers from
   ICO, 1990 - 2019.
 - `CMO-Historical-Data-Annual.xlsx`: World Bank Commodity Price Data (The Pink Sheet).
 - `London Robusta Coffee Futures Historical Data.csv`: London Robusta
 Coffee Futures from https://www.investing.com/commodities/london-coffee-historical-data
 - US Coffee C Futures Historical Data.csv`: Coffee C Futures from
   https://www.investing.com/commodities/us-coffee-c-historical-data
 - `growprices.csv`: Prices paid to growers from
   ICO, 1965 - 2013.

Outputs from Stage 0 (dataset preparation):

 - `observations-coffee.csv`: All output production metrics by
   municipality and year.
 - `observations-full-all.csv`: Complete panel with all
   municipalities-year combinations, with NAs when output metrics unavailable.
 - `observations-full.csv`: Unbalanced panel of municipalities and
years, with observations missing when output metrics unavailable.

Outputs from Stage 1 (OLS regressions):

 - `crossval.csv`: Cross-validation metric (RMSE) for all model specifications.

Outputs from Stage 2 (municipality fitting):

 - `allfits-SUFFIX.csv`: These contain Bayesian MCMC draws for a given
 model calibration. Each row is a MCMC draw for a given region, with
 columns for the parameters. Suffixes described below.
 - `sumstats-SUFFIX.csv`: These contain summary statistics for each
   parameter. Each row is a description of the distribution of a given
   parameter in a given region. Suffixes described below.

Outputs from Stage 3 (ierarchical meta-analysis):

 - `allvals-SUFFIX.RData`: These contain the inputs to the
 hierarchical Bayesian modeling process. Suffixes described below.
 - `combined-betagamma-SUFFIX.RData`: These contain the MCMC draws
   from the hierarchical Bayesian modeling process.
 - `combined-muniinfo-SUFFIX.RData`: These contain region-level data
   corresponding to a given hierarchical Bayesian modeling process.
 
Suffixes for models above:

Stage 2 and 3 output files are named with a suffix to identify the
assumptions of the fitted model. Here is the list of suffixes used:

 - `arabica-eq24-orig`: Full model, under the main yield specification
   and Nerlove price expectations, for Arabica-only municipalities.
 - `arabica-eq24rational-orig`: Full model, under the main yield specification
   and rational price expectations, for Arabica-only municipalities.
 - `eq24-minx`: Full model, under the alternative yield specification
   and Nerlove price expectations.
 - `eq24-orig`: Full model, under the main yield specification
   and Nerlove price expectations.
 - `eq24-orig2`: Full model, under the main yield specification and
   Nerlove price expectations; second calibration.
 - `eq24rational-orig`: Full model, under the main yield specification
   and rational price expectations.
 - `nos-orig`: Model without harvest selection, under the main yield
   specification and Nerlove price expectations.
 - `nosrational-orig`: Model without harvest selection, under the main
   yield specification and rational price expectations.
 - `ols-orig`: OLS-equivalent yield-only model, under the main yield
   specification.
 - `arabica-eq24-orig`: Full model, under the main yield specification
   and Nerlove price expectations, for Robusta-only municipalities.
 - `arabica-eq24rational-orig`: Full model, under the main yield specification
   and rational price expectations, for Robusta-only municipalities.

## Process for generating results

The working directory for scripts is set at the top of each script,
and it is assumed that the archive directories are under
`~/research/coffee-dataverse`. Here, `~` refers to the user's
directory.

### Stage 0 - Prepare the dataset

Note that these steps (like the following stages) can be skipped,
since the resulting dataset is included in archive.

Core process:

 - Run `prepare-observations.R`. Writes `observations-coffee.csv` and
   either `observations-full.csv` (if `do.allyears` is false) or
   `observations-full-all.csv` (if `do.allyears` is true). Both of
   these files are needed for steps below.
   
Additional results:

- Run `price-compare.R`. Writes
   `figures/data/price-compare-time.pdf`,
   `figures/data/price-compare-matrix.pdf`,
   `figures/data/price-compare-matrix-arabica.pdf`,
   `figures/data/price-compare-matrix-robusta.pdf`.

### Stage 1 - OLS regression results

Core results:

 - Run `tbl-allmunis-topmunis.R`. Prints the base specification
   regression tables.
 - Run `fig-gddmod.R`. Generates figures
   `regs/gddmod-single-SUFFIX.pdf`, `regs/gddmod-SUFFIX.pdf`, and
   `regs/gddmod-hist.pdf`.
 - Run `tbl-prodharvmods.R`. Prints the comparison of output effects.
 - Run `tbl-pricemod.R`. Prints the price effect regressions.
 - Run `fig-delays.R`. Generates `figures/regs/delays.pdf`,
   `tables/regs/delays-w2.tex`, and `tables/regs/delays-con.tex` (note
   that a `tables` directory must be created for these to be written).

Additional results:

 - Run `crossval.R`. Writes `data/crossval.csv`
 - Run `fig-crossval.R`. Generates figures `xval/spec-scale-big.pdf`,
`xval/preferred-vs-months.png`, `xval/preferred-vs-gedd.png`, and
`xval/spec-scale.pdf`.
 - Run `tbl-data.R`. Prints table with summary statistics.
 - Run `fig-basics.R`. Generated figures `data/concavity.pdf`,
   `data/preds-gdds.pdf`, `data/preds-edds.pdf`,
   `data/preds-tmin.pdf`, `data/preds-prcp.pdf`, and
   `data/preds-yields.pdf`.
 - Run `tbl-eddsens.R`. Prints regression table of EDD threshold
   sensitivity test.
 - Run `tbl-selection.R`. Prints regression table testing for
   selection bias.

## Stage 2: Fitting the municipality-level parameters

Core process:

 - Run `model-single.R`. This takes a very long time to complete. It
 produces `sumstats-SUFFIX.csv` and `allfits-SUFFIX.csv` files.
 - Run `model-single-rational.R`. This takes a very long time to
 complete.  It produces `sumstats-eq24rational-SUFFIX.csv` and
 `allfits-eq24rational-SUFFIX.csv` files.
 - Run `model-single-rational-nos.R`. This takes a very long time to
   complete.   It produces `sumstats-nosrational-SUFFIX.csv` and
   `allfits-nosrational-SUFFIX.csv` files.
 - Run `model-single-nos.R`. This takes a long time to complete.  It
   produces `sumstats-nos-SUFFIX.csv` and `allfits-nos-SUFFIX.csv`
   files.
 - Run `model-single-ols.R`. This takes a long time to complete.  It
   produces `sumstats-ols-SUFFIX.csv` and `allfits-ols-SUFFIX.csv`
   files.

Above, `SUFFIX` may be `orig` or `minx`.

Additional results:

 - Run `txt-varyields.R`. Prints values used in the text.

## Stage 3: Hierarchical meta-analysis

Core results:

 - Run `model-combine.R`. This takes a long time to
   complete. Generates `allvals-SUFFIX.RData`,
   `combined-muniinfo-SUFFIX.csv`, and
   `combined-betagamma-SUFFIX.RData`.
 - Run `tbl-elasticity.R`. Prints out planting elasticity estimate.
 - Run `model-report.R`. Prints the model comparison tables.
 - Run `fig-munimean.R`. Generates the municipality-level variation
   graphs.
 - Run `fig-yielddist.R`. Generates the yield distribution comparison
   and KDD effect graphs.
 - Run `model-simulate.R`. Generates the simulation and die-off
   probability graphs.

Additional results:

- Run `model-combine-simple.R`. This takes a long time to
   complete. Generates `taupooling.pdf`.
 - Run `fig-costkdds.R`. Generates `figures/alts/costkdds.pdf`.
 - Run `tbl-errorcalib.R`. Prints tables for error calibration section.
